The Grammar of Graphics

ANU BDSI workshop
Data Visualisation with R Part 1

Emi Tanaka

Biological Data Science Institute

10th April 2024

Welcome 👋

Teaching team

  • Academic statistician passionate about data science and open source software
  • Currently, Deputy Director of ANU Biological Data Science Institute and Executive Editor of The R Journal
  • PhD in Statistical Bioinformatics
  • BSci (Adv Maths) with a major in Mathematics and Statistics
  • Loves data and coding

        https://emitanaka.org     @statsgen     fosstodon.org/@emitanaka

  • 2nd-year PhD student at ANU, working on phylogenomics methods to study hybridisation
  • BSci in Bioinformatics
  • Enjoys hiking and science
  • jeremiasivan

  • PhD student (almost finished) @ Moritz Lab, E&E Division, RSB, ANU
  • Working on the historical biogeography of Indo-Australian birds
  • BSc (Hons) with a major in Zoology and Ecology & Conservation
  • Loves games, art markets, and FOOD
  • @goaudreymp
  • https://linktr.ee/goaudreymp

  • 3rd-year PhD student @ Linde Lab, E&E Division, RSB, ANU
  • Investigating the evolution of the Australian orchid flora and its associated funga
  • MScSt (Biodiversity Science)
  • Loves playing music, reading and nature
  • @rpodonnell
  • rpodonnell.github.io

  • 2nd-year PhD student @ Sequeira Lab, E&E Division, RSB, ANU
  • Identifying social, collective, or coordinated movement behaviour patterns in sharks using tracking data
  • MSc (Marine Biology), BSc (Biology)
  • Loves triathlon, hiking and the ocean
  • @nilskreuter
  • https://linktr.ee/nilskreuter

Workshop materials

All materials will be hosted at
https://anu-bdsi.github.io/workshop-data-vis-R1/

🕙 Schedule

Time          Content
10:00–10:30   Drawing plots with ggplot2
10:30–11:00   Exercise 1
11:00–11:30   The Grammar of Graphics
11:30–12:00   Exercise 2
12:00–12:10   Break
12:10–12:30   Drawing multiple layers with ggplot2
12:30–12:50   Exercise 3
12:50–13:00   Wrap up

Today’s learning objectives

  • Create basic plots using ggplot2
  • Understand the concept of the grammar of graphics
  • Construct plots with multiple layers in ggplot2
  • Adjust scales and guides within ggplot2

Current learning objective

  • Understand the concept of the grammar of graphics

Catalogue of plot types

Plotting

Plotting more than one plot

Plotting layer

Plotting small multiples

Plotting from a list of programs

  • One function → one complete plot type
  • The number of plots that can be drawn is limited by the number of plot functions

The Grammar of Graphics

  • In linguistics, we combine a finite number of words to construct a vast number of sentences, under a shared understanding of grammar.
  • Wilkinson (2005) introduced “the grammar of graphics” as a paradigm that describes plots by combining a finite number of components.
  • Wickham (2010) implemented his interpretation of the grammar of graphics in the ggplot2 R package (as part of his PhD project).
  • ggplot2 is widely used in the scientific literature and even in newspapers!
  • The grammar of graphics paradigm has also been implemented in other programming languages, such as Python (e.g., plotnine), Julia (e.g., VegaLite.jl), and JavaScript (e.g., Vega).
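To make the idea of combining components concrete, here is a minimal sketch (assuming ggplot2 is installed, and using its built-in mpg data purely for illustration) that builds a scatter plot by naming each grammar component explicitly. layer() is ggplot2's low-level layer constructor; geom_point() is a shortcut that calls it with the same defaults.

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` is used for illustration only
library(ggplot2)

# Every component of the grammar written out explicitly
p_explicit <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + # data + aesthetic mappings
  layer(
    geom = "point",          # geometric object
    stat = "identity",       # statistical transformation (none)
    position = "identity",   # position adjustment (none)
    params = list(na.rm = FALSE)
  ) +
  scale_x_continuous() +     # scale for the x aesthetic
  scale_y_continuous() +     # scale for the y aesthetic
  coord_cartesian()          # coordinate system

# The everyday shortcut fills in the same components for you
p_short <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
```

Swapping any single component (e.g. a different geom or coordinate system) gives a different plot without needing a new plotting function.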

Deconstructing histogram

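In grammar terms, a histogram is a bar geometry drawn from a binning statistic. The sketch below assumes ggplot2 and its built-in mpg data (not necessarily the data used on the slide); the choice of 10 bins is arbitrary.

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` and bins = 10 are illustrative choices
library(ggplot2)

# Histogram = "bar" geom + "bin" stat + "stack" position
p_layer <- ggplot(mpg, aes(x = hwy)) +
  layer(geom = "bar", stat = "bin", position = "stack",
        params = list(bins = 10, na.rm = FALSE))

# The familiar shortcut for the same layer
p_hist <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 10)

# Bin edges and counts computed by the stat
layer_data(p_layer)[, c("xmin", "xmax", "count")]
```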

Deconstructing barplot

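A barplot uses the same bar geometry as a histogram; only the statistic changes, from binning to counting. A minimal sketch, again assuming ggplot2 and its mpg data for illustration:

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` is used for illustration only
library(ggplot2)

# Barplot = "bar" geom + "count" stat + "stack" position
p_layer <- ggplot(mpg, aes(x = class)) +
  layer(geom = "bar", stat = "count", position = "stack",
        params = list(na.rm = FALSE))

# geom_bar() uses these same defaults
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()
```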

Visualising amounts and proportions

[Figure panels: barplot, scatter plot, grouped barplot, stacked barplot, heatmap, pie chart, stacked percentage barplot, stacked density plot]

  • geom_bar()
  • geom_col()
  • geom_point()
  • geom_tile()
  • geom_density()
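Two of these plot types show how far a few components go. In this sketch (assuming ggplot2 and its mpg data, chosen only for illustration), a pie chart is a single stacked bar drawn in polar coordinates, and a heatmap is geom_tile() filled by pre-computed counts of two categorical variables.

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` is used for illustration only
library(ggplot2)

# Pie chart: one stacked bar (constant x) with y mapped to the angle
p_pie <- ggplot(mpg, aes(x = "", fill = class)) +
  geom_bar() +
  coord_polar(theta = "y")

# Heatmap: one tile per class/drv combination, coloured by its count
counts <- as.data.frame(table(class = mpg$class, drv = mpg$drv))
p_heat <- ggplot(counts, aes(x = class, y = drv, fill = Freq)) +
  geom_tile()
```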

Position adjustments

A barplot with geom_bar() with a categorical variable

  • If you have a categorical variable, then you usually want to study the frequency of its categories.
  • Here, stat = "count" (the default stat for geom_bar()) computes the frequency of each category for you.
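The bar heights come from the stat, not from a column in the data. This sketch (assuming ggplot2 and its mpg data for illustration) makes the default mapping explicit with after_stat(count) and compares the computed frequencies with base R's table().

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` is used for illustration only
library(ggplot2)

# Default: stat = "count" supplies y for you
p_default <- ggplot(mpg, aes(x = class)) + geom_bar()

# Same plot, with the computed variable mapped to y explicitly
p_explicit <- ggplot(mpg, aes(x = class)) +
  geom_bar(aes(y = after_stat(count)), stat = "count")

# Frequencies computed by the stat, alongside base R's table()
layer_data(p_default)$count
table(mpg$class)
```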

A barplot with geom_bar() with a discrete numerical variable

  • Mapping a numerical variable to the x-axis results in a continuous scale.
  • If you want to treat each value of a discrete numerical variable as its own category, convert it to a factor instead, e.g. x = factor(year).
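A sketch of the difference, assuming ggplot2 and its mpg data, where year is an integer column taking the values 1999 and 2008:

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` is used for illustration only
library(ggplot2)

# Numeric year: continuous x scale, so bars sit at 1999 and 2008 on a number line
p_num <- ggplot(mpg, aes(x = year)) + geom_bar()

# Factor year: discrete x scale, with one evenly spaced bar per year
p_fac <- ggplot(mpg, aes(x = factor(year))) + geom_bar()

# Inspect which kind of x scale each plot ended up with
class(layer_scales(p_num)$x)
class(layer_scales(p_fac)$x)
```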

Summary data

  • Sometimes your input data may already contain pre-computed counts.

A barplot with geom_col()

  • In this case, you don’t need stat = "count" to do the counting for you; use geom_col() instead.
  • geom_col() is essentially shorthand for geom_bar(stat = "identity"), where stat = "identity" means that the values are used as supplied, without any statistical transformation.
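A minimal sketch, assuming ggplot2: the pre-computed counts here are made with table() from the built-in mpg data, as an illustrative stand-in for data that arrive already summarised.

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` is used for illustration only
library(ggplot2)

# Pre-computed counts: one row per class, with the frequency in `Freq`
counts <- as.data.frame(table(class = mpg$class))

# geom_col() draws the supplied y values as-is...
p_col <- ggplot(counts, aes(x = class, y = Freq)) + geom_col()

# ...which matches geom_bar() with the identity stat
p_bar <- ggplot(counts, aes(x = class, y = Freq)) + geom_bar(stat = "identity")
```

Using geom_bar() without stat = "identity" on this data would instead count rows (one per class), giving every bar a height of 1.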

A stacked barplot with "stack"
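position = "stack" is the default position for geom_bar(). In this sketch (assuming ggplot2 and its mpg data), fill = drv is an illustrative choice that splits each class into groups to stack.

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` and fill = drv are illustrative
library(ggplot2)

# Within each class, the bars for each drive type are stacked on top of each other
p_stack <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "stack")
```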

A grouped barplot with "dodge"

  • position = "dodge" is shorthand for position = position_dodge()
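The string and function forms produce the same layer. A sketch assuming ggplot2 and its mpg data, with fill = drv again chosen only for illustration:

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` and fill = drv are illustrative
library(ggplot2)

# Bars for each drive type are placed side by side within each class
p_str <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

# Equivalent, using the position function directly (lets you set its arguments)
p_fun <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = position_dodge())
```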

Another grouped barplot with "dodge2"

  • "dodge2" (position_dodge2()) uses a different algorithm to recalculate the x positions, and it has a padding argument to add space between adjacent geometric objects.
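A sketch of the padding option, assuming ggplot2 and its mpg data; padding = 0.5 is an arbitrary illustrative value (the default is 0.1), and the helper bar_width() is hypothetical, defined here only to compare the results.

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` and padding = 0.5 are illustrative
library(ggplot2)

# Default padding versus extra space between neighbouring bars
p_pad_default <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = position_dodge2())
p_pad_wide <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = position_dodge2(padding = 0.5))

# Hypothetical helper: average drawn bar width in a plot's first layer
bar_width <- function(p) with(layer_data(p), mean(xmax - xmin))

# Larger padding shrinks each bar, leaving more space between them
bar_width(p_pad_default)
bar_width(p_pad_wide)
```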

Stacked percentage barplot with "fill"

  • If you want to compare percentages across the different x values, position = "fill" can be handy: it rescales each stacked bar to the same total height (1), so each segment shows its share.
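A sketch assuming ggplot2 and its mpg data (fill = drv is illustrative): after position = "fill", every class's bar reaches a height of exactly 1.

```r
# Minimal sketch: assumes ggplot2 is installed; `mpg` and fill = drv are illustrative
library(ggplot2)

# Each class's bar is rescaled so its drive-type segments sum to 1
p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill")

# Segment boundaries are now proportions between 0 and 1
layer_data(p_fill)[, c("x", "ymin", "ymax")]
```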

Summary

Exercise time
